AA-Turner
Member

Currently, proofread_canonicals() takes c. 4-5 minutes, with the vast majority of that time spent reading the files from disk. This PR cuts the runtime to c. 100-120 seconds by checking the files in multiple threads. We also switch from re to bytes methods for a further slight improvement, as this avoids Unicode encoding/decoding.
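
For readers without the diff to hand, here is a minimal sketch of the general approach, not the code merged in this PR. The names find_canonical, files_missing_canonical, and CANONICAL_PREFIX are hypothetical, as are the exact marker being searched for and the worker count. It assumes the check amounts to finding a canonical link tag in each HTML file. Threads help here because file reads release the GIL, and searching the raw bytes means the file content is never decoded.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List, Optional

# Hypothetical marker: the real check in proofread_canonicals() may differ.
CANONICAL_PREFIX = b'<link rel="canonical" href="'


def find_canonical(path: Path) -> Optional[bytes]:
    """Return the canonical URL found in *path* as bytes, or None if absent."""
    content = path.read_bytes()  # raw bytes, so no Unicode decoding is needed
    start = content.find(CANONICAL_PREFIX)
    if start == -1:
        return None
    start += len(CANONICAL_PREFIX)
    end = content.find(b'"', start)  # closing quote of the href attribute
    if end == -1:
        return None
    return content[start:end]


def files_missing_canonical(root: Path, max_workers: int = 8) -> List[Path]:
    """Check every HTML file under *root* using a pool of threads."""
    paths = sorted(root.rglob("*.html"))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as *paths*.
        results = pool.map(find_canonical, paths)
        return [path for path, url in zip(paths, results) if url is None]
```

Since the work is dominated by disk reads, the best worker count likely depends on the storage in use, so max_workers is worth measuring rather than assuming.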

A

@AA-Turner AA-Turner requested a review from hugovk April 11, 2025 04:27
@AA-Turner AA-Turner merged commit e80b729 into main Apr 11, 2025
6 checks passed
@AA-Turner AA-Turner deleted the proofread-perf branch April 11, 2025 13:10